-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADR - content hashing caching strategy for business rules. #650
base: master
Are you sure you want to change the base?
ADR - content hashing caching strategy for business rules. #650
Conversation
7fc822e
to
9a5d21b
Compare
|
||
After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future. | ||
|
||
This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uncertain whether there are fields that can be mutated but are not user editable. If there is a difference, and editable fields are a subset of mutable fields, then would mutable be a preferable term here?
Tiny typo "... check if to versions".
Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name. | ||
|
||
|
||
The hash generated in this ADR will not currently be attached to TrackedModel - thought it may be desirable in future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attached == cached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saved to it's own field I guess.
The fields to be hashed are contained in `copyable_fields` but since this doesn't communicate the intent used here, this will be proxied as `mutable_fields`, which is explicitly about providing which fields can be hashed to check equivilence. | ||
|
||
|
||
TrackedModel will gain an API named get_content_hash() that returns a sha256 hash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it only matters that it's a sufficiently unique hash.
|
||
This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change. | ||
|
||
When a workbasket is published, the validity of all other unpublished workbaskets must be verified - with this system, most of the business rule checks would remain valid and the only the checksums need to be verified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... only the checksums need to be verified.
May be clearer to say "... only the content hash needs to be verified."?
Context | ||
------- | ||
|
||
After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about what can invalidate a check after it has completed. We know that after a check has been run, then newly published tariff data can cause some types of check to become invalid - an ME32 check against a Measure, for instance. So hashing doesn't help avoid the need to run a check again in those circumstances.
It would be worth pointing out that caveat here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is really important - I've updated the PR description to cover this, and will update the ADR to reflect this (date validity checkers will be their own adr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool stuff! I would love an example involving a current business rule, if at all possible, just so that I can visualise it / walk it through in my head, but looks great apart from that. Please excuse the pedantic spelling comments...
|
||
TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point. | ||
|
||
Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to give an example where two different types produce the same hash? It's probably trivial, but I have Monday brain
The fields to be hashed are contained in `copyable_fields` but since this doesn't communicate the intent used here, this will be proxied as `mutable_fields`, which is explicitly about providing which fields can be hashed to check equivilence. | ||
|
||
|
||
TrackedModel will gain an API named get_content_hash() that returns a sha256 hash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to be more specfic than "API"? Would it be a class_method? As I read it currently, it sounds a bit like the hash will be attached to the tracked model, which contradicts what's written on L28. I know that's not the intent, but just be good to clarify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I guess I meant a method - e.g. .content_hash(), vs something we save forever in a field.
Consequences | ||
------------ | ||
|
||
This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will part of this work involve creating a list of rules to which this kind of caching will be applicable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though it doesn't need to be exhaustive - as it can be extended if needed later.
|
||
After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future. | ||
|
||
This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--> mechanism
--> equivalent
|
||
This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent. | ||
|
||
In the simplest sense this means hashing fields that exclude the PK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's meant by "fields that exclude the PK" ?
|
||
TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point. | ||
|
||
Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--> necessary
|
||
TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point. | ||
|
||
Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth adding a link to docs on python's default hashing?
Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name. | ||
|
||
|
||
The hash generated in this ADR will not currently be attached to TrackedModel - thought it may be desirable in future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--> though
Consequences | ||
------------ | ||
|
||
This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--> becoming
When a workbasket is published it may clash with any of the unpublished workbaskets - however there are a class of business rules whose results should still be valid if the content of the model they reference hasn't changed.
This ADR proposes a content checksum mechanism for models, that checksum can be saved and used to check if a model has changed.
The content checksum only covers user editable fields.